Two algorithms for solving the (symmetric distance) traveling salesman
problem have been programmed for a high-speed digital computer. The
first produces guaranteed optimal solutions for problems involving no more
than 13 cities; the time required (IBM 7094 II) varies from 60 milliseconds
for a 9-city problem to 1.75 seconds for a 13-city problem. The second
algorithm produces precisely characterized, locally optimal solutions
for large problems (up to 145 cities) in an extremely short time and is based
on a general heuristic approach believed to be of general applicability to
various optimization problems. The average time required to obtain a
locally optimal solution is under $30n^3$ microseconds, where $n$ is the
number of cities involved. Repeated runs on a problem from random initial
tours result in a high probability of finding the optimal solution among
the locally optimal solutions obtained. For large problems where many
locally optimal solutions have to be obtained in order to be reasonably
assured of having the optimal solution, an efficient reduction scheme is
incorporated in the program to reduce the total computation time by a
substantial amount.

I. INTRODUCTION 

The traveling salesman problem may be stated as follows: "A sales- 
man is required to visit each of the n given cities once and only once, 
starting from any city and returning to the original place of departure. 
What route, or tour, should he choose in order to minimize the total 
distance traveled?" Instead of distance, other notions such as time, cost, 
etc., can be considered as well. In this paper, we shall use the term 
"cost" to represent any such notion. 

Mathematically, the problem may be stated in the following two 
equivalent ways: 

(1) Given a "cost matrix" $D = (d_{ij})$, where $d_{ij}$ = cost of going from
city $i$ to city $j$ ($i, j = 1, 2, \ldots, n$), find a permutation
$P = (i_1, i_2, i_3, \ldots, i_n)$ of the integers from 1 through $n$ that
minimizes the quantity

$$d_{i_1 i_2} + d_{i_2 i_3} + \cdots + d_{i_n i_1}.$$

(2) Given a "cost matrix" $D$ as above, determine $x_{ij}$ which minimizes
the quantity $Q = \sum_{i,j} d_{ij} x_{ij}$ subject to

(a) $x_{ii} = 0$

(b) $x_{ij} = 0, 1$

(c) $\sum_i x_{ij} = \sum_j x_{ij} = 1$

and

(d) for any subset $S = \{i_1, i_2, \ldots, i_r\}$ of the integers from 1 through $n$,

$$x_{i_1 i_2} + x_{i_2 i_3} + \cdots + x_{i_{r-1} i_r} + x_{i_r i_1}
\begin{cases} < r & \text{for } r < n \\ \le n & \text{for } r = n. \end{cases}$$



The second version is a formulation of the traveling salesman problem
as a linear program and hence the problem may be solved as such.
However, the number of constraints becomes astronomical even for
relatively small $n$. Dantzig, Fulkerson, and Johnson¹ have given a
linear-programming approach to the symmetric ($d_{ij} = d_{ji}$) traveling
salesman problem that considers only part of the required linear
constraints and have found the technique effective in several cases.

Since we have only a finite number of possible tours to consider
($\tfrac{1}{2}(n - 1)!$), the problem is really to obtain a reasonably efficient
algorithm for finding an optimal solution. Certain algorithms employing
branch and bound techniques have been tried and appear to be efficient
for some problems; however, the computation time involved is
unpredictable and increases very rapidly with $n$. Numerous authors have
tried different techniques to obtain "near-optimal" solutions by a
series of approximations and for specific problems were able to prove
optimality of their solutions. For any conjectured optimal solution,
however, the proof for optimality is dependent upon inspectional work
which is usually heuristic in nature, and is certainly highly problem
dependent, thus making it difficult to program for a computer.
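
To make formulation (1) and the tour count concrete, the following is a minimal sketch of exhaustive search. It is not one of the paper's two algorithms; the function names `tour_cost` and `brute_force` are illustrative assumptions. City 0 is held fixed as the starting point and mirror-image tours are skipped, so exactly $\tfrac{1}{2}(n - 1)!$ tours are examined for a symmetric cost matrix.

```python
from itertools import permutations


def tour_cost(tour, d):
    """Cost of the closed tour: d[i1][i2] + d[i2][i3] + ... + d[in][i1]."""
    n = len(tour)
    return sum(d[tour[i]][tour[(i + 1) % n]] for i in range(n))


def brute_force(d):
    """Examine every distinct tour of a symmetric cost matrix d (n >= 3).

    Returns (best_tour, best_cost, number_of_tours_examined).
    """
    n = len(d)
    best_tour, best_cost, examined = None, float("inf"), 0
    for rest in permutations(range(1, n)):
        # A tour and its reversal have the same cost when d is symmetric,
        # so keep only one representative of each mirror-image pair.
        if rest[0] > rest[-1]:
            continue
        examined += 1
        tour = (0,) + rest
        cost = tour_cost(tour, d)
        if cost < best_cost:
            best_tour, best_cost = tour, cost
    return best_tour, best_cost, examined
```

For a 13-city problem this would already mean $\tfrac{1}{2} \cdot 12! = 239{,}500{,}800$ tours, which illustrates why enumeration is not a practical method.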

Two algorithms for solving the (symmetric distance) traveling salesman
problem have been programmed for a high-speed digital computer.
The first algorithm, called k-length string optimization, is discussed in
detail in Appendix A. It produces guaranteed optimal solutions for
problems involving no more than 13 cities; the time required* varies
from 60 milliseconds for a 9-city problem to 1.75 seconds for a 13-city
problem. The algorithm is a slight modification of that given by Held
and Karp.² However, we achieve a significant reduction in computation
time by taking advantage of the fact that the distance matrix is
symmetric. Because of the limit on the size of problem it can effectively
handle, we find that it is not as useful as the second algorithm, which
we shall discuss below. The second algorithm (implemented by a
copyrighted program) produces precisely characterized locally optimal
solutions for large problems (up to 145 cities) in an extremely short time
and is based on a general heuristic approach believed to be of general
applicability to various optimization problems. The average time
required per locally optimal solution is under $30n^3$ microseconds, where
$n$ is the number of cities involved. Repeated runs on a problem from
random initial tours result in a high probability of finding the optimal
solution among the locally optimal solutions obtained. For large problems
where many locally optimal solutions have to be obtained in order to
be reasonably assured of having the optimal solution, an efficient
reduction scheme is incorporated in the program to reduce the total
computation time by a substantial amount.

* IBM 7094 II.
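
As background for the first algorithm, here is a minimal sketch of the standard Held and Karp dynamic program on which it is said to be based. This is the textbook recurrence, not the k-length string optimization of Appendix A, and it does not include the symmetry saving described above; the name `held_karp` and the bitmask encoding are assumptions made for illustration.

```python
from itertools import combinations


def held_karp(d):
    """Optimal tour cost for cost matrix d (n >= 2) by dynamic programming.

    best[(mask, j)] is the cheapest cost of a path that starts at city 0,
    visits exactly the cities whose bits are set in mask, and ends at
    city j (j must be one of those cities).
    """
    n = len(d)
    best = {(1 << j, j): d[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev_mask = mask & ~(1 << j)
                best[(mask, j)] = min(
                    best[(prev_mask, k)] + d[k][j] for k in subset if k != j
                )
    full = (1 << n) - 2  # every city except city 0
    return min(best[(full, j)] + d[j][0] for j in range(1, n))
```

The table holds on the order of $n 2^n$ entries, which is consistent with the practical limit of about 13 cities reported for the first algorithm on the machine used.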

II. λ-OPTIMALITY

Before we describe the second algorithm, we first introduce the
concept of λ-optimality. This serves to classify tours into a descending
chain of classes possessing increasingly stronger necessary conditions
for optimality. As we shall see later, this forms the basis for the
construction of our second algorithm.

From the point of view of graph theory, we may consider the $n$ cities
as vertices of a nondirected complete graph, and the entries $d_{ij}$ of the
distance matrix as real numbers assigned to links $u_{ij}$ connecting city $i$ to
city $j$. A permutation $P = (i_1, i_2, \ldots, i_n)$ representing a tour may be
considered as a collection of $n$ links $u_{i_1 i_2}, u_{i_2 i_3}, \ldots, u_{i_n i_1}$ forming a
Hamiltonian circuit, and the quantity $C = d_{i_1 i_2} + d_{i_2 i_3} + \cdots + d_{i_n i_1}$
the cost associated with the tour.

For convenience, let us say a link $u_{ij}$ is admissible if there is an optimal
tour containing it. All other links are inadmissible. A set of links is said
to be an admissible set if there exists an optimal tour containing all the
links in the set. The index of a tour is the maximum number of links
which the tour has in common with an optimal tour, i.e., the maximum
number of links in the tour which form an admissible set.

We define λ-optimality of tours as follows:

Definition: A tour is said to be λ-optimal (or simply λ-opt) if it is
impossible to obtain a tour with smaller cost by replacing any λ of its
links by any other set of λ links.

We list below a few theorems concerning λ-optimality. Proofs are
omitted since they are all fairly obvious. In Appendix D some
interesting unsolved problems concerning λ-optimality will be discussed.

Theorem 1: Let T be a tour which is λ-optimal with index k. Then either
T is optimal or $k < n - \lambda$.

Theorem 2: Any tour is 1-optimal.

Theorem 3: The following properties of a tour are equivalent: 

(a) The tour is 2-optimal. 

(b) The tour is optimal relative to inversion; where by inversion we mean 
reversing the order of a set of neighboring cities in the tour. 

(c) The tour does not intersect itself (in a generalized sense for non- 
Euclidean distance matrices). 

Theorem 4: A tour is optimal if and only if it is n-optimal.

Theorem 5: Let $C_\lambda$ denote the set of all λ-optimal tours; then
$C_1 \supset C_2 \supset \cdots \supset C_n$. In other words, a λ-optimal tour
is also λ'-optimal for λ' < λ.

Note that the well-known theorem, which states that an optimal tour
does not intersect itself, is contained in Theorems 3, 4, and 5 ($C_n \subset C_2$).
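
Property (b) of Theorem 3 suggests a direct test for 2-optimality: try every inversion and see whether any lowers the cost. The sketch below assumes a symmetric cost matrix and recomputes the full tour cost for each candidate, which is simple rather than fast; the names `improving_inversion` and `is_2_optimal` and the tolerance `eps` are illustrative assumptions.

```python
def tour_cost(tour, d):
    """Cost of the closed tour given as a list of city indices."""
    n = len(tour)
    return sum(d[tour[i]][tour[(i + 1) % n]] for i in range(n))


def improving_inversion(tour, d, eps=1e-12):
    """Return a cheaper tour obtained by reversing one run of neighboring
    cities, or None if no inversion lowers the cost (d symmetric)."""
    n = len(tour)
    base = tour_cost(tour, d)
    for i in range(n - 1):
        for j in range(i + 1, n):
            candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
            if tour_cost(candidate, d) < base - eps:
                return candidate
    return None


def is_2_optimal(tour, d):
    """True when the tour is optimal relative to inversion (Theorem 3b)."""
    return improving_inversion(tour, d) is None
```

On four cities at the corners of a unit square, the perimeter tour passes this test while the tour that crosses the two diagonals does not, in line with property (c).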

III. THE SECOND ALGORITHM 

In his paper, "A Method for Solving Traveling Salesman Problems",
G. A. Croes³ applied a simple transformation, called "inversion", to
transform a trial solution into another with smaller costs, iterating until 
no further inversions are desirable. Then he gave a method for deriving 
the optimal solution from the inversion free solution obtained. He pointed 
out, however, that the final adjustment procedures are difficult to 
program for a computer as they involve mostly inspectional work. For 
large problems, it seems doubtful whether a human being can exhaus- 
tively carry the computations through or even whether a computer 
program based upon those techniques will be feasible or efficient. 

Putting aside the final adjustment procedures, we ask if there are 
other simple transformations which are stronger than the inversions. 
Since the inversion-free tours are just the 2-opt tours, we decided to 
write a computer program to produce 3-opt tours. As it turns out, the 
3-opt tours are very much stronger than the inversion-free tours in the 
sense that (1) every 3-opt tour is inversion free, (2) the average tour
cost is considerably less, and (3) the probability of an optimal solution
showing up as a 3-opt tour is significantly higher than that of 2-opt
tours. Experimenting with a program producing 4-opt tours, we also
find that we spend a great deal more computation time in producing
4-opt tours while not noticeably increasing the probability that a tour is
optimal. Computational results on many problems support the claim
that we have found a really efficient way of attacking the traveling 
salesman problem by generating as many 3-opt tours from random 
initial tours as we can afford time-wise and then choosing the best among 
the 3-opt solutions as our "conjectured solution". The merits of this 
heuristic approach based on probability as compared to the usual 
approach of using further complicated refinements to transform locally 
optimal solutions into global optima will be discussed later in the paper. 

IV. THE GENERAL APPROACH 

Since we have found that a 3-opt tour has a nontrivial probability of
being optimal, we make, for a given problem, an estimate of this
probability* (of success) $P_a$ and produce from our program $k$ 3-opt tours
(not necessarily distinct) from random initial starts. We choose $k$ so
that $1 - (1 - P_a)^k$ is as close to 1 as we desire. Since the running time
in obtaining each 3-opt tour is reasonable ($25n^3$ to $30n^3$ microseconds,
where $n$ is the number of cities), we can indeed afford the luxury of
making $k$ large. For example, a 30-city problem can reasonably be
expected to be "solved" in 75 seconds with $k = 100$. At any rate, the
best of the $k$ locally optimal solutions, even though it may not actually
be the best, will be close enough to the best solution as to offer a
satisfactory answer for most practical problems arising in actual
applications. Also, a large set of "satisfactory" locally optimal solutions may
give an engineer a more flexible choice of solution, one that also
satisfies further nonessential but nevertheless desirable features which
may be hard to program.
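
The choice of $k$ described above can be worked out directly. The sketch below solves $1 - (1 - P_a)^k \ge c$ for the smallest integer $k$ and estimates the running time from the stated $25n^3$ to $30n^3$ microseconds per tour; the function names and the default coefficient are illustrative assumptions, not part of the paper's program.

```python
import math


def tours_needed(p_success, confidence):
    """Smallest k with 1 - (1 - p_success)**k >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_success))


def estimated_seconds(n, k, microseconds_per_n_cubed=30.0):
    """Rough running time for k 3-opt tours on n cities, assuming each
    tour costs microseconds_per_n_cubed * n**3 microseconds."""
    return k * microseconds_per_n_cubed * n ** 3 * 1e-6


# The 30-city example with k = 100: between 67.5 s and 81 s,
# which brackets the 75 seconds quoted in the text.
low = estimated_seconds(30, 100, 25.0)
high = estimated_seconds(30, 100, 30.0)
```

With a per-tour success probability of 0.05, a 0.99 chance of seeing the optimal tour at least once needs 90 tours under this formula.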

When the number of cities involved is rather large, say more than 30, the
number of locally optimal solutions that needs to be generated in order
to be reasonably assured of having the optimal solution will be very
large, as is expected. Incorporated in the program is a reduction scheme
whereby information gained from an initial set of locally optimal
solutions is used to reduce the size of the problem, thereby decreasing
substantially the time involved in generating additional locally optimal
solutions.

* This probability, in general, depends on the size and nature of the problem.
From the statistics collected after running many problems, we shall give a
heuristic estimate in Appendix C.

A brief description of the computer program to produce 3-opt tours
is given in Appendix B. We mention here an alternate characterization
of a 3-opt tour which is more graphic and which we use in our program.
A tour T is said to be optimal relative to inversion and insertion if, for
every $k$, no section of $k$ consecutive cities in T, say $(i_{a+1}, i_{a+2}, \ldots, i_{a+k})$,
can be removed from T and reinserted (as is, or inverted) between any
two consecutive remaining cities to produce a tour of lesser cost. We
prove the following:

Theorem 6: A tour T is optimal relative to inversion and insertion if and
only if it is 3-optimal.

Proof: A tour T is not 3-optimal if and only if there exist 3 links, say
$u_{ij}$, $u_{kl}$, $u_{mn}$, which may be exchanged for 3 other links, say $u_{im}$, $u_{lj}$,
and $u_{nk}$ (as in Fig. 1; other possibilities are similar), to form a tour of
lesser cost. From Fig. 1, we see that the section from $m$ to $l$ may be
inserted between $i$ and $j$, and hence the tour is not optimal relative to
inversion and insertion.

Fig. 1 — Proof of Theorem 6.

V. GENERAL DESCRIPTION OF THE METHODS USED TO PRODUCE 3-OPT
TOURS

In the process of obtaining 3-opt tours from a random initial tour, the
basic operation consists of determining whether any section of length $k$
in the present tour can be inserted (as is, or inverted) between two other
neighboring cities so as to decrease the tour cost. This was proved in
Section IV (Theorem 6) to be equivalent to exchanging three links in
the tour for three other links. Once an improvement is found and made,
we treat the resulting tour as our initial tour and iterate the process.
The process terminates with a locally optimal (3-opt) tour when no
further improvement can be achieved. We call the portion of the
computation from the time we made the last improvement to the
verification that no further improvement can be achieved by this algorithm
the "check-out period." The time involved in the "check-out period"
is proportional to $\binom{n}{3}$. For different random initial tours, however, the
number of improvements may vary and the time it takes to find each
improvement also varies. This accounts for some variation in the
computation time for the individual locally optimal solutions. However, it
turns out that the average computation time for a set of 10 or more
cases is uniformly around $50n^3$ microseconds, which is further reduced
by the techniques discussed below.
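
The basic operation just described can be sketched as a plain local search: remove a section, try reinserting it (as is, or inverted) in every gap of the remaining tour, accept the first improvement, and repeat until none exists. This is an illustration under simplifying assumptions, not the program of Appendix B: it recomputes the whole tour cost for every candidate, so it is far slower than the timings quoted in the text, and the names `find_improvement`, `local_search` and the tolerance `eps` are hypothetical.

```python
import math
import random


def tour_cost(tour, d):
    """Cost of the closed tour given as a list of city indices."""
    n = len(tour)
    return sum(d[tour[i]][tour[(i + 1) % n]] for i in range(n))


def find_improvement(tour, d, eps=1e-9):
    """Return a cheaper tour reachable by one section move, else None."""
    n = len(tour)
    base = tour_cost(tour, d)
    for a in range(n):
        rotated = tour[a:] + tour[:a]      # candidate section starts at index 0
        for k in range(1, n - 1):          # section length; at least 2 cities remain
            section, rest = rotated[:k], rotated[k:]
            for pos in range(len(rest)):   # reinsert after rest[pos]
                for piece in (section, section[::-1]):
                    candidate = rest[:pos + 1] + piece + rest[pos + 1:]
                    if tour_cost(candidate, d) < base - eps:
                        return candidate
    return None


def local_search(tour, d):
    """Iterate improvements; the final pass that finds nothing plays the
    role of the check-out period."""
    tour = list(tour)
    while True:
        better = find_improvement(tour, d)
        if better is None:
            return tour
        tour = better


# Usage: eight cities evenly spaced on a unit circle, random initial tour.
points = [(math.cos(2 * math.pi * i / 8), math.sin(2 * math.pi * i / 8)) for i in range(8)]
dist = [[math.dist(p, q) for q in points] for p in points]
start = list(range(8))
random.Random(0).shuffle(start)
result = local_search(start, dist)
```

Reinserting an inverted section into the gap it came from is an ordinary inversion, so a tour that survives this search is in particular inversion free.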

From our experiments, we find that links used in a locally optimal
solution are often exchanged in and out many times in the improvement
process, and this tends to increase the computation time considerably.
Two methods are incorporated in the program to reduce this. First,
after each improvement, the improved tour $(t_1', t_2', \ldots, t_n')$ is further
perturbed by a rotation (see Appendix B) so as to prevent the new links
just inserted from being removed again too soon. Secondly, the following
special feature making use of locally optimal tours previously obtained
is used: After $m$ 3-opt tours $T_1, T_2, \ldots, T_m$ ($m = 1, 2, \ldots$) are
generated, consider the set $S$ consisting of the union of the links found in
$T_1, T_2, \ldots, T_m$. In the process of obtaining the $(m + 1)$th 3-opt tour,
we systematically break off 3 links $u_{t_1 t_n}$, $u_{t_k t_{k+1}}$, $u_{t_j t_{j+1}}$ in $(t_1, t_2, \ldots, t_n)$
to see if they can be replaced by 3 other links so as to form a tour of
lesser cost. In the algorithm (see Appendix B for details), $u_{t_1 t_n}$ is held
fixed while $k$ goes from 1 to $n - 3$, coupled with each $j$ going from
$k + 1$ to $n - 1$. We skip this sequence of tests for possible
improvements altogether if $u_{t_1 t_n} \in S$, and proceed as if all such tests for possible
improvements fail. The tour $(t_1, t_2, \ldots, t_n)$ is "rotated" by the
substitution $(t_n, t_1, t_2, \ldots, t_{n-1}) \to (t_1, t_2, \ldots, t_n)$ and the improvement
process continues. When no further improvements can be made relative
to this special feature, we obtain a tour which we call an "almost 3-opt
tour." This almost 3-opt tour is then put through the algorithm without
the special feature to obtain a final 3-opt tour.
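
The bookkeeping behind the special feature is small enough to sketch. Links are stored as unordered pairs, $S$ is the union of the links of the tours found so far, and a rotation moves the last city to the front so that a different link becomes the fixed link $u_{t_1 t_n}$. The helper names below are illustrative assumptions; Appendix B is the authority on the actual program.

```python
def links_of(tour):
    """Set of undirected links used by a closed tour."""
    n = len(tour)
    return {frozenset((tour[i], tour[(i + 1) % n])) for i in range(n)}


def rotate(tour):
    """The substitution (t_n, t_1, ..., t_{n-1}) -> (t_1, ..., t_n)."""
    return [tour[-1]] + list(tour[:-1])


def skip_fixed_link(tour, S):
    """True when the fixed link joining t_1 and t_n already occurs in S,
    in which case the tests for this position are skipped."""
    return frozenset((tour[0], tour[-1])) in S


# Usage with two previously found tours (made-up 5-city examples).
S = links_of([0, 1, 2, 3, 4]) | links_of([0, 2, 1, 3, 4])
current = [0, 3, 1, 2, 4]
```

Here the link joining cities 0 and 4 is in $S$, so the tests with that link held fixed would be skipped; after one rotation the fixed link joins cities 4 and 2, which is not in $S$, so the tests would run.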

This process may seem roundabout, but in effect it results in
postponing the replacements of links which have occurred in other locally
optimal tours until other replacements have been tried. Actually, an
almost 3-opt tour often has most, if not all, of its links in $S$ so that its
check-out period is almost negligible. The time required to obtain the
final 3-opt tour is also usually quite small. The over-all effect is that we
are able to find improvements much sooner and also the number of
improvements made in reaching the locally optimal solution is
significantly decreased. As a result, for a set of about 10 locally optimal
solutions, we are able to reduce the total computation time by at least 40
percent.

VI. THE REDUCTION PROCEDURE 

After a certain number, say $r$, of locally optimal solutions are
obtained, consider the set $I$ of links common to all those locally optimal
solutions. Intuitively, we feel that since the $r$ 3-opt tours are produced
from randomly generated initial tours, any link in $I$ should have a very
high probability of belonging to an optimal solution when $r$ is reasonably
large. Further, for each problem certain simple features (like some
obvious links connecting two cities) of the optimal solution should be
reflected in a majority of 3-opt tours so that we expect the set $I$ to be
frequently nonempty. Using $I$, we can reduce the size of the problem
as follows: A link $u_{ij}$ is called basic if $u_{ij}$ is in $I$, and a city $i$ is removed
if there are two basic links $u_{ij}$, $u_{ik}$ incident at $i$. The procedure can of
course remove many strings of cities at the same time. If $u_{j i_1}, u_{i_1 i_2},
\ldots, u_{i_p j'}$ are all basic and no other basic links are connected to cities
$j$ and $j'$, the string of cities $i_1, i_2, \ldots, i_p$ is removed. We call cities $j$
and $j'$ corresponding terminals. If a total of $t$ cities are removed, we then
solve for 3-opt tours in the remaining $n - t$ cities. By reassigning
artificial link costs to all links $u_{jj'}$, where $j$ and $j'$ are corresponding terminals,
we make sure that $j$ and $j'$ will be neighboring cities in the solution to
the reduced problem, and hence the string of cities between $j$ and $j'$
which were removed can be reinserted accordingly. This process can be
iterated as many times as we please. Note that we tacitly assume an
optimal tour contains the cities $j, i_1, i_2, \ldots, i_p, j'$ as a substring. If
this is the case, we say that the reduction is proper. Otherwise, the
reduction is improper and the optimal tour will be missed in all future
3-opt tours generated in the same run. However, even if this should
happen (rare if $r$ is large or $n \le 30$), the best tour obtained usually has
a tour cost differing from the optimal tour by an extremely small amount.
For large $n$, several independent runs should be made to guard against
the possibility of an improper reduction.
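
The first step of the reduction, finding the basic links and the strings of cities they allow us to remove, can be sketched as follows. The sketch stops at identifying each string and its corresponding terminals; assigning the artificial link costs and reinserting the strings are left out. The names `common_links` and `removable_strings` are illustrative assumptions.

```python
def common_links(tours):
    """The set I: undirected links present in every one of the tours."""
    link_sets = [
        {frozenset((t[i], t[(i + 1) % len(t)])) for i in range(len(t))}
        for t in tours
    ]
    return set.intersection(*link_sets)


def removable_strings(tours):
    """Return a list of (j, [i_1, ..., i_p], j_prime): each string of
    cities with two basic links, plus its corresponding terminals."""
    adjacency = {}
    for link in common_links(tours):
        a, b = tuple(link)
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    strings = []
    # Terminals are the cities with exactly one basic link.
    for start in sorted(c for c, nb in adjacency.items() if len(nb) == 1):
        path, prev, cur = [start], None, start
        while True:
            onward = [c for c in adjacency[cur] if c != prev]
            if not onward:
                break
            prev, cur = cur, onward[0]
            path.append(cur)
        # Each chain is walked from both ends; record it once, and only
        # if it contains at least one interior city to remove.
        if path[0] < path[-1] and len(path) > 2:
            strings.append((path[0], path[1:-1], path[-1]))
    return strings
```

If every tour in the sample is identical, all cities have two basic links and no terminals exist, so this sketch returns an empty list rather than removing the whole tour.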

When the number of cities involved is fairly large, this reduction
procedure is very effective in reducing computation time required to
obtain the optimal solution, as the results given in the next section show.
The reduction procedure also provides a large variety of hard problems
(involving a smaller number of cities) from which we can learn a great
deal statistically about the characteristics of 3-opt tours. We consider
those problems harder than randomly chosen problems because they
retain essentially the heart of the original larger problem. The
probability that a 3-opt tour generated from a reduced problem is optimal
relative to the reduced problem is usually lower than the mean
probability for random problems of the same size. However, in spite of the
fact that improper reduction may occur, the probability that a 3-opt
tour generated from a reduced problem is optimal relative to the original
problem cannot be decreased. Since problem-size is reduced, more 3-opt
tours can be obtained in a given time, and thus the probability of finding
the optimal solution in a given amount of computation time is greatly
increased.

VII. COMPUTATIONAL RESULTS 

7.1 Twenty and Fewer City Problems 

Six problems whose cities are points $(x_i, y_i)$ generated randomly in
a 100 × 100 square and of sizes ranging from 12 to 20 cities were tested.
For these cases, 5 to 10 3-opt tours were generated per problem. It 
turned out that in each problem all 3-opt tours generated were identical, 
and the costs of the solutions obtained in all six problems were as good 
as, or better than, solutions obtained by other methods (3 of which are 
known to be optimum). It appears that randomly generated problems 
are easy to solve by our method. 

Forty 3-opt tours were generated for the 20-city problem of G. A.
Croes,³ which has a known optimal solution with cost 246. Reduction
was used with $r = 8$. Successive stages of reduction reduced the number
of cities from 20 to 11, 11, 11, 11 (i.e., no further reduction was produced
after the second round). The optimal tour appeared 13 times out of 40
and the total computation time used for the 40 3-opt tours was 3.43
seconds. This 20-city problem seems "harder to solve" than most 20-city
problems we have encountered.

Many more problems with sizes around 20 cities obtained from the
reduction process of larger problems were investigated. Judging from
all the results, we believe that we can "solve" any 20 or fewer city
problem by our method in (very conservatively) 5 to 10 seconds.

7.2 The 25-City Problem of Held and Karp²

Forty 3-opt tours were generated with reduction in sets of 10. The 
reduction procedure reduced the size of the problem from 25 to 10 to 7 
and 7. The optimal tour (cost 1711) appeared 26 times out of 40. Total 
computation time was 5.24 seconds. It is interesting to note that there 
are 7-city problems produced as a result of our reduction for which 3-opt 
tours are not necessarily optimal. 

7.3 The 33-City Problem of Karg and Thompson⁴

This 33-city problem seems very easy to solve by our methods. Fifty 
3-opt tours were generated with reduction in sets of 10. The reduction 
procedure reduced the size of the problem from 33 to 11, 11, 11, and 9. 
The optimal tour (cost 10,801) appeared 19 times out of 50 and the total 
computation time was 10.7 seconds. Figs. 2 and 3 illustrate some stages 
in the reduction process. The solid lines indicate links in the set I, and 
together with the broken lines, form the set S of all links found in the 
3-opt tours generated. 

7.4 The 42-City Problem of Dantzig, Fulkerson and Johnson¹

A 42-city problem was solved by Dantzig, Fulkerson and Johnson.¹
The optimal tour has cost 699. Forty 3-opt tours were generated with
the optimal tour appearing 11 times. Total computation time was 36.3
seconds. The successive stages of reduction and other pertinent
information obtained are given in Table I. Note that $d$, the number of distinct
3-opt tours obtained per round, decreases, indicating that for smaller
problems, there are not too many distinct 3-opt tours and hence the
probability that a 3-opt tour is indeed optimal is quite large.

Table I — 42-City Problem Summary

SRR   n    r    d    C_M   C_m   C̄       f    T
4     24   10   4    713   699   703.7   4    36.3

(Rows for rounds 1 to 3 are incomplete; surviving entries: 30, 28.3, 24, 713, 699, 32.2.)

SRR: Successive rounds of reduction.
n: Number of cities in reduced problem.
r: Number of 3-opt tours generated per round.
d: Number of distinct 3-opt tours obtained per round.
C_M: Maximum tour cost (for the current round).
C_m: Minimum tour cost (for the current round).
C̄: Average tour cost (for the current round).
f: Number of occurrences of the best tour in the current round.
T: Total time of computation in seconds (accumulated).

7.5 The 48-City Problem of Held and Karp²

In Ref. 2, Held and Karp obtained the "best" solution to this 48-city
problem with cost = 11,470. D. W. Sweeney (private communication)
later found a tour with cost = 11,461. We strongly conjecture that this
is indeed the optimal tour and shall consider it as such for the purpose
of our work.

Statistics collected on numerous runs indicate that a 3-opt tour has a
probability $p \approx 0.05$ of being optimal, with each 3-opt tour obtained in
the average computation time of 2.80 seconds.

Without the reduction procedure, we need to spend about 280 seconds
to produce 100 3-opt tours if we want a probability of 0.99 of obtaining
the optimal solution. With reduction, setting $r = 10$, we obtained the
optimal solution 21 times in a total computation time of 63 seconds, as
shown in Table II. When $d$ drops below $r/2$, we consider the resulting
reduction too binding in evaluating the heuristic probability $p$. For
example, from this particular run, we count 4 out of 50 instead of 21
out of 100 as the frequency of occurrences of the optimal tour.

Table II — A Typical 48-City Run

(Table body not recoverable; one surviving entry: 10.)

7.6 The 57-City Problem of Karg and Thompson⁴

In Ref. 4, Karg and Thompson introduced a 57-city problem and
found by their method tours with costs of 12,986 and 12,985. In Ref. 5,
Reiter and Sherman developed a family of algorithms and found two
tours which are better, with costs 12,955 and 12,967.* Our program also
produced the tour with cost 12,955, which we conjecture to be the
optimal solution. Statistics collected on more than 1000 3-opt tours indicate a
probability ≈ 0.02 for a 3-opt tour to be optimal.

* The next best tour we obtained is one with cost 12,966. We believe this to be
the same tour and the difference due to a discrepancy of 1 unit in the distance of
one particular link used in the tour. Our distance matrix was obtained from
Ref. 4.

A typical run of 100 cases with reduction in sets of 10 appears in
Table III.

A few highlights in the reduction process are illustrated in Figs. 4
and 5 below.

An example of an "unfortunate" run where a link not in the optimal
tour is committed in the early stages of reduction is shown in Table IV.
Note that improper reduction appears in round 3 and as a consequence
the subsequent values of $d$ drop sharply. For the purpose of counting
the occurrences of the optimal tour, only the first 3 rounds are
considered (subsequent rounds have $d < r/2$), giving us 0 out of 30 for
this run. As can be seen, we do no worse than to obtain the best tour
obtained by Karg and Thompson in Ref. 4. Furthermore, the
computation time involved in the first round usually exceeds 40 percent of the
total computation time, so that even when improper reduction happens,
the total computation time is still less than that for obtaining 30 3-opt
tours without reduction.

Table III (rows for rounds 8 to 10; earlier rows and the f column are not recoverable)

SRR   n    r    d    C_M      C_m      C̄          f     T
...   ...  ...  ...  ...      ...      ...        ...   119.8
8     33   10   8    13,346   12,955   13,122.3   ...   129.3
9     33   10   8    13,300   12,955   13,132.0   ...   138.8
10    33   10   8    13,473   12,985   13,156.6   ...   149.2

Table IV — A 57-City Run With Improper Reduction

SRR   n    r    d    C_M      C_m      C̄          f    T
1     57   10   10   13,741   12,993   13,430.7   1    45.7
2     45   10   10   13,416   12,986   13,123.0   1    73.4
3*    37   10   9    13,197   12,997   13,091.7   1    87.2
4     30   10   4    13,114   12,985   13,001.8   3    94.2
5     23   10   4    13,012   12,985   12,991.7   2    97.4
6     22   10   3    12,997   12,985   12,990.2   2    100.2
7     21   10   2    12,986   12,985   12,985.2   8    102.7
8     17   10   2    12,986   12,985   12,985.7   3    104.2
9     17   10   2    12,986   12,985   12,985.6   4    105.9
10    17   10   2    12,986   12,985   12,985.7   3    107.2

* Improper reduction appears in this round.

7.7 A 105-City Problem

To test the effectiveness of our method, a 105-city problem was
constructed from the 48-city problem and 57-city problem using the facts
that $u_{30,33}$ is the largest link (cost 669) in the best tour for the 48-city
problem and $u_{40,46}$ (cost 685) is the largest link in the best tour for the
57-city problem. Thus, city 30 of the 48-city problem was connected to
city 40 of the 57-city problem by a link with cost 10; similarly, city 33
of the 48-city problem was connected to city 46 of the 57-city problem
by a link of cost 10. All other links between the cities in the 48-city
problem and the 57-city problem were assigned random costs varying
from 685 to 750, while links between the cities in the same problem
remain unchanged. We purposely made those link costs moderate
compared to big links in each of the two problems in order to induce
sufficient mixing when a random tour is reduced to a 3-opt tour. From the
method of constructing the problem there is a tour (the conjectured
best tour) for this 105-city problem for which the cost is (12,955 +
11,461 + 20) - (669 + 685) or 23,082. Since the probabilities of
obtaining the tours with costs 12,955 and 11,461 are ≈0.02 and 0.05,
respectively, we expect that the probability of a 3-opt tour being
optimal in this 105-city problem is less than 0.001. Computation time per
3-opt tour without reduction is about 35 seconds. A run of 20 3-opt
tours was made with one round of reduction which reduced the number
of cities from 105 to 81. Total computation time for the 20 3-opt tours
was 476 seconds, for an average of 23.8 seconds per local optimum.
Although we did not obtain the conjectured best solution, the 3-opt tour
costs are surprisingly good, the worst being 24,581 and the best 23,096.
An interesting fact about the 3-opt tours obtained is that besides the
two short links bridging the two problems, at least one other pair of
links connecting two cities in different problems appears in all the 3-opt
tours obtained, indicating that this 105-city problem has a structure by
itself and is not merely the conjunction of two separate problems. This
is indeed what we intended it to be.
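
The arithmetic in this construction is easy to check. The short sketch below recomputes the conjectured best cost, the rough product of the two per-problem probabilities (treating them as independent is an assumption made here for illustration), the average time per tour, and how far the best tour found lies above the conjectured best. Variable names are illustrative.

```python
best_48, best_57 = 11_461, 12_955        # best known tour costs
longest_48, longest_57 = 669, 685        # links removed from each best tour
bridge_cost = 10                         # each of the two bridging links

# Join the two tours: drop one long link from each, add two bridges.
conjectured_best = best_48 + best_57 + 2 * bridge_cost - (longest_48 + longest_57)

# Rough scale for the chance that one 3-opt tour is optimal, assuming
# the two halves behave independently.
p_product = 0.05 * 0.02

average_seconds = 476 / 20               # 20 tours in 476 seconds
gap = 23_096 - conjectured_best          # best tour found vs conjectured best
relative_gap = gap / conjectured_best
```

The best tour found in the run exceeds the conjectured best by 14 cost units, which is about 0.06 percent.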

This we believe is the largest traveling salesman problem ever
attempted. Reasonable estimates indicate that we may be able to solve
a problem of this size with the reduction technique in 100 minutes*
with a probability of success >0.5.

* See Appendix C.

7.8 Other Experiments

Experiments with 2-opt tours were moderately successful for smaller
problems. A 20-city, 2-opt tour may be obtained in approximately 0.048
seconds with a probability of being optimum ≈0.06. The decrease in
time compared with 3-opt tours is by a factor of 5. The decrease in
probability shows that our 3-opt procedure may still be the best. When
the number of cities becomes larger, the decrease in probability is so
sharp as to make runs with the 2-opt procedure undesirable.

Similar experiments with programs to produce 4-opt tours show an
increase of computation time per local optimum by a factor of $0.8n$,*
while the chances of obtaining the optimal solution are not noticeably
increased. Practically all 3-opt tours are 4-opt and the ranges of tour
costs are not noticeably improved.

VIII. CONCLUSIONS AND DISCUSSION 

As mentioned earlier, the methods we used here in solving the traveling 
salesman problem were based upon heuristic principles believed to be 
of general applicability to various optimization problems. These may be 
roughly summarized as follows: 

8.1 Probabilistic Approach vs Deterministic Approach 

In dealing with problems similar to a large traveling salesman problem, 
where a really efficient algorithm for the best solution is unavailable, it 
is in general time consuming, if not entirely hopeless, to work on re- 
finement techniques to obtain the best solution. Rather, the approach 
should be to develop a technique by which good locally optimal solutions 
can be obtained very fast, and with reasonable probability that among 
the locally optimal solutions, we may indeed find the best. (In actual 
applications, the best of a set of good locally optimal solutions, even 
though it may not be the true optimum, will be close enough to the optimum 
to offer a satisfactory answer for most problems.) The fact that we 
generate for a given traveling salesman problem a large number of 3-opt 
tours rather than develop means of further improving a 3-opt tour is 
based on this principle. The fast computation gives us the ability to 
generate many locally optimal solutions which, coupled with a reasonable 
probability that a locally optimal solution is optimal, guarantees us a 
very high probability of success for solving the problem. 

* Although the ratio of $\binom{n}{4}$ and $\binom{n}{3}$ is $\frac{n-3}{4}$, for each 4 links removed, there 
are 48 ways of putting the 4 strings together, compared with 8 ways of putting 3 
strings together. 
8.2 Choice of Algorithm 

Consider two algorithms $A_1$ and $A_2$ which will produce locally optimal 
solutions for a given problem in computation times $t_1$ and $t_2$. Suppose 
$A_1$ is "stronger" than $A_2$ in the sense that $A_1$ produces locally optimal 
solutions which are optimal more frequently than $A_2$, say $A_1$ with 
probability $p_1$ and $A_2$ with probability $p_2$, and $p_1 > p_2$. $A_1$ need not be 
preferred to $A_2$ if $t_1$ is disproportionately greater than $t_2$. This can be seen 
as follows: for a given problem, suppose we are permitted a total computation 
time $t$ (which may be the amount of computing time we are able 
to buy with available funds), so that in time $t$ we can perform $[t/t_i] = k_i$ 
experiments with algorithm $A_i$. Then the probability that among the 
$k_i$ locally optimal solutions obtained, we indeed have the optimal solution 
is $p_i^* = 1 - (1 - p_i)^{k_i}$. This is the quantity we should maximize. 
In the event that we are interested in only good approximate solutions, 
we should choose an algorithm $A_1$ over another algorithm $A_2$ such that 
for a given amount of time $t$, the best of the $k_1$ locally optimal solutions 
relative to $A_1$ is better than the best of the $k_2$ locally optimal solutions 
relative to $A_2$. 
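
The comparison above can be illustrated with a short calculation. The following is a minimal sketch in Python, not part of the original program; the function name `success_probability` and the probabilities, run times, and budget are hypothetical values chosen only to show that a weaker but much faster algorithm can yield the larger $p_i^*$.

```python
import math

def success_probability(p, t_run, budget):
    """Return 1 - (1 - p)^k, where k = floor(budget / t_run) is the number
    of independent locally optimal solutions obtainable within the budget."""
    k = math.floor(budget / t_run)
    return 1.0 - (1.0 - p) ** k

# Hypothetical figures for illustration only (not measurements from the paper).
budget = 60.0                                                # seconds available
strong = success_probability(p=0.20, t_run=10.0, budget=budget)  # k = 6
weak = success_probability(p=0.05, t_run=1.0, budget=budget)     # k = 60

print(f"stronger but slower algorithm: {strong:.3f}")
print(f"weaker but faster algorithm:   {weak:.3f}")
```

With these assumed figures the faster algorithm has the higher overall probability of success, which is the situation described in the text.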

As an example, consider the sequence of algorithms $A_\lambda$ to produce 
$\lambda$-opt tours with associated probabilities $p_\lambda$ and computing times $t_\lambda$. 
For the range of the size of problem we are dealing with, say from 10 
to 100 cities, we have reason to believe that $p_3^*$ is the largest among the 
$p_\lambda^*$'s, as indicated by our computation results, and that the best tour 
produced from $k_3$ 3-opt tours is at least as good if not better than the 
best $\lambda$-opt tours produced in comparable time using any other $A_\lambda$. 

8.3 Random Improvement vs Steepest Descent 

Within the algorithm for obtaining a locally optimal solution, sub- 
stantial saving in time can be achieved by not attempting to find the 
best improvement possible at any stage, but rather to take the first 
improvement that occurs. In general, the method of steepest descent 
tends to increase the computation time disproportionally and should 
not be used. Attention should be directed to finding improvements with 
a minimum amount of computation rather than to making the maximum 
improvement possible at each step. 
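
The two strategies can be contrasted in a small sketch. The Python below is an illustration under stated assumptions, not the program described in this paper: it uses the simpler 2-opt exchange rather than the 3-opt exchange, a random symmetric integer cost matrix, and hypothetical names (`two_opt`, `gain`, `tour_cost`). With `first_improvement=True` the first improving exchange found is applied at once; with `False` every candidate is examined and the best one is applied (steepest descent).

```python
import random

def tour_cost(tour, d):
    """Total cost of the closed tour."""
    return sum(d[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def gain(tour, d, i, j):
    """Saving from replacing links (i, i+1) and (j, j+1) by (i, j) and (i+1, j+1)."""
    n = len(tour)
    a, b, c, e = tour[i], tour[i + 1], tour[j], tour[(j + 1) % n]
    return d[a][b] + d[c][e] - d[a][c] - d[b][e]

def two_opt(tour, d, first_improvement):
    """2-opt local search; returns (tour, number of candidate moves examined)."""
    tour, n, examined = list(tour), len(tour), 0
    while True:
        chosen = None  # (gain, i, j) of the move to apply, if any
        for i in range(n - 1):
            for j in range(i + 2, n):
                examined += 1
                g = gain(tour, d, i, j)
                if g > 0 and (chosen is None or g > chosen[0]):
                    chosen = (g, i, j)
                    if first_improvement:
                        break
            if chosen and first_improvement:
                break
        if chosen is None:          # no improving exchange: tour is 2-opt
            return tour, examined
        _, i, j = chosen
        tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])

# Hypothetical test instance: random symmetric integer costs.
random.seed(0)
n = 30
d = [[0] * n for _ in range(n)]
for a in range(n):
    for b in range(a + 1, n):
        d[a][b] = d[b][a] = random.randint(1, 1000)
start = random.sample(range(n), n)

first_tour, first_examined = two_opt(start, d, first_improvement=True)
best_tour, best_examined = two_opt(start, d, first_improvement=False)
print("first improvement:", tour_cost(first_tour, d), "moves examined:", first_examined)
print("steepest descent: ", tour_cost(best_tour, d), "moves examined:", best_examined)
```

Both variants stop at a 2-opt tour; the counts of moves examined give a rough indication of the work each strategy spends to get there on a particular instance.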

8.4 Restricting Search in Increasingly Smaller Domain 

When sufficient information has been gathered about the problem, 
ways and means should be investigated to restrict substantially the 
domain of search. This should be done even with the possibility that 
the optimal solution may be lost in the process (of course only with 
small probability). This is well illustrated by our reduction procedure 
described in Section VII. 
APPENDIX A 

k-Length String Optimization 

In the first algorithm, called k-length string optimization, a dynamic 
programming technique is used to optimize tours (or strings) of $k$ 
cities for $k \le 13$.* The algorithm is a slight modification of that given 
in Ref. 2 by Held and Karp. However, we achieve a significant reduction 
in computation time by taking advantage of the fact that the distance 
matrix is symmetric. 

Let the $k$ cities be represented by the integers 1 through $k$, let $X = 
\{2, 3, \ldots, k\}$ be the set of integers from 2 to $k$, and let $S$ be a subset of $X$ 
consisting of $m$ elements; i.e., $|S| = m$. Let $C(S,i)$ with $i \in S$ denote 
the minimum cost of starting from city 1 and visiting all cities $j$ in $S$, 
terminating at city $i$. Then the quantities $C(S,i)$ can be computed 
recursively as follows: 

$$C(\{i\}, i) = d_{1i} \qquad (1)$$

$$C(S, i) = \min_{j \in S - i} \left[ C(S - i, j) + d_{ji} \right] \qquad (2)$$

For example, if $S = \{3, 5, 9, 11\}$, then $C(S,9)$ = cost of best string 
from 1 to 9 through 3, 5, 11 

$$C(S, 9) = \min \left\{ \begin{array}{l} C(S - 9, 3) + d_{3,9} \\ C(S - 9, 5) + d_{5,9} \\ C(S - 9, 11) + d_{11,9} \end{array} \right.$$

We note here that $S - i$ is a subset of $X$ such that $|S - i| = |S| - 1$, 
and thus the quantities $C(S - i, j)$ have all been computed a step before 
in the recursion scheme. 
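
The recursion (1)-(2) can be sketched directly. The Python below is a minimal illustration and not the program described here: it evaluates $C(S,i)$ for all subsets up to $X$ itself and closes the tour at the end, so it does not include the saving obtained from the symmetry of the distance matrix. Cities are numbered from 0 (city 0 plays the role of city 1), and the name `held_karp` and the 4-city matrix `d_example` are hypothetical. At least two cities are assumed.

```python
from itertools import combinations

def held_karp(d):
    """Return (cost, tour) of an optimal tour for the distance matrix d."""
    n = len(d)
    C = {}        # C[(S, i)]: min cost from city 0 through all of S, ending at i in S
    parent = {}   # predecessor of i on the string realising C[(S, i)]
    for i in range(1, n):
        C[(frozenset([i]), i)] = d[0][i]                       # equation (1)
    for m in range(2, n):                                       # subsets by size
        for subset in combinations(range(1, n), m):
            S = frozenset(subset)
            for i in subset:
                rest = S - {i}
                cost, j = min((C[(rest, j)] + d[j][i], j) for j in rest)  # equation (2)
                C[(S, i)] = cost
                parent[(S, i)] = j
    X = frozenset(range(1, n))
    cost, i = min((C[(X, i)] + d[i][0], i) for i in X)          # close the tour
    # Recover the order of the cities by walking the stored predecessors.
    tour, S = [], X
    while True:
        tour.append(i)
        if len(S) == 1:
            break
        S, i = S - {i}, parent[(S, i)]
    tour.append(0)
    tour.reverse()
    return cost, tour

# Hypothetical 4-city example: unit square with diagonal cost 2.
d_example = [[0, 1, 2, 1],
             [1, 0, 1, 2],
             [2, 1, 0, 1],
             [1, 2, 1, 0]]
best_cost, best_tour = held_karp(d_example)
print(best_cost, best_tour)
```

The dictionary `C` holds one entry per pair $(S, i)$, which makes the storage growth noted in the footnote to this appendix easy to see.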

* Storage requirements in dynamic programming seriously limit the size of 
the problem that can be handled. 

For $k = 2t + 1$, we recursively compute and store $C(S,i)$ for all 
subsets $S$ of $X$ for $|S| = 1$ up to $|S| = t$. Then we successively compute 
$C(S,i)$ for $|S| = t + 1$. At this point, if we denote the complement 
of these $S$'s in $X$ by $\bar{S}$, we see that $S^* = \bar{S} \cup \{i\}$ is a subset of $t$ elements 
and $C(S^*,i)$ has already been computed. The cost $r$ of the optimal 
tour $T$ is therefore given by 

$$r = \min_{S} \left[ \min_{i \in S} \left[ C(S,i) + C(S^*,i) \right] \right]$$

where S ranges over all subsets of X containing t + 1 elements. Since 
either S or S* must contain say city 2, we may further restrict the range 
of S to only those containing the city 2. 

For $k = 2t$ the procedure is similar except that we need not compute 
$C(S,i)$ for $|S| > t$. 

The order of the cities in the tour $T$ can now be determined. With the 
"middle" city $i$ and sets $S$, $S^*$ determined from the expression for $r$, 
we find a city $i_1$ in $S - i$ such that $C(S - i, i_1) + d_{i_1 i} = C(S,i)$; then 
a city $i_2$ in $S - \{i, i_1\}$ such that $C(S - \{i, i_1\}, i_2) + d_{i_2 i_1} = C(S - i, i_1)$ 
and so on until $S$ is exhausted, and similarly for the set $S^*$ to produce 
the tour $T = 1, \ldots, i_2, i_1, i, i_1^*, i_2^*, \ldots, 1$. 

This algorithm can be used, with a slightly modified distance matrix, 
to find the minimum string from city 1 to city $k$ through the cities 
$2, 3, \ldots, k - 1$ and its associated cost $C(X,k)$. We set $d_{1k} = C(\{k\},k) = 0$ 
and $C(S,k) = \infty$ for $|S| \ge 2$ in the recursive computation scheme 
described above. This ensures that city $k$ is next to city 1 in the tour 
produced, and hence by removing the link from city 1 to city $k$ we get 
the best string with 1 as our initial city and $k$ the terminal city. 

Given an n-city problem, a tour $T$ through the $n$ cities is said to be 
locally optimal relative to the k-length string optimization algorithm if 
every ordered set of $k$ consecutive cities in $T$, say $(i_{a+1}, \ldots, i_{a+k})$, 
subscripts reduced modulo $n$, is optimal as a string from $i_{a+1}$ to $i_{a+k}$ 
going through the cities $i_{a+2}, i_{a+3}, \ldots, i_{a+k-1}$. The above program 
may be used to produce locally optimal tours of this type for any n-city 
problem without change as follows: Consider a random initial tour 
represented by the permutation $P = (i_1, i_2, \ldots, i_n)$. We map the first 
$k$ cities $i_1, i_2, \ldots, i_k$ onto $1, 2, \ldots, k$ and use the program to find the 
best string from 1 to $k$, say $1, j_a, j_b, \ldots, k$. The associated permutation 
$P^* = (i_1, i_{j_a}, i_{j_b}, \ldots, i_k, \ldots, i_n)$ gives us a tour whose cost is no 
larger than that of $P$. When there is no gain in cost from $P$ to $P^*$ it means that 
$i_1, i_2, \ldots, i_k$ is already optimal as a string. Next, we rotate $P^*$ by a 
length $s$ (relatively prime to $n$) and repeat the process until there are 
$n$ consecutive rotations without a decrease in cost. Then every ordered 
set of $k$ consecutive cities in the final permutation is now optimal as a 
string. Experiments with this procedure indicated, however, that it is 
time consuming and not nearly as effective as the second algorithm 
which we have discussed in the paper. 

APPENDIX B 

Outline of Computer Program to Produce 3-Opt Tours from Random 
Initial Tours 

Notation: 

$n$          number of cities 
$(d_{ij})$   distance matrix 
$r$          number of 3-opt tours desired before reduction 
$u_{ij}$     link connecting city $i$ and city $j$ 
$t_i$        $i$th city in the tour 
$S$          the union of the set of links found in previously generated 3-opt tours 
$q$          a program branching parameter. 

1. $S = \emptyset$. 

2. Do through (14), $m = 1, 1, r$. 

3. Generate a random tour $(t_1, t_2, \ldots, t_n)$. 

4. $q = 0$ if $m = 1$, otherwise $q = 1$. 

5. Do through (12), count $= 1, 1, n$. 

6. If $q = 0$, skip 7. 

7. If $u_{t_1 t_n} \in S$ go to 12 (special feature). 

8. Do through (11), $k = 1, 1, n - 3$. 

9. Do through (11), $j = k + 1, 1, n - 1$. 

10. If $d_{t_k t_{j+1}} + d_{t_1 t_j} \le d_{t_1 t_{j+1}} + d_{t_k t_j}$ set $d = d_{t_k t_{j+1}} + d_{t_1 t_j}$ and $a = 16$, 
otherwise set $d = d_{t_1 t_{j+1}} + d_{t_k t_j}$ and $a = 18$. 

11. If $d + d_{t_{k+1} t_n}$ (cost of links added) $< d_{t_1 t_n} + d_{t_k t_{k+1}} + d_{t_j t_{j+1}}$ 
(cost of links removed) go to $a$ (otherwise loop). 

12. $(t_1, t_2, \ldots, t_n) = (t_n, t_1, \ldots, t_{n-1})$ (rotate). 

13. If $q = 1$, set $q = 0$ and go to 5 (almost 3-opt tour obtained). 

14. $S = S \cup$ links in $(t_1, t_2, \ldots, t_n)$ (3-opt tour obtained). 

15. Go to reduction (see description in Section VI). 

16. $(t_1, t_2, \ldots, t_n) = (t_{j+2}, \ldots, t_n, t_{k+1}, \ldots, t_j, t_1, \ldots, t_k, t_{j+1})$ 
(links exchanged and tour perturbed). 

17. Go to 5 (treat improved tour as initial tour). 

18. $(t_1, t_2, \ldots, t_n) = (t_{j+2}, \ldots, t_n, t_{k+1}, \ldots, t_j, t_k, \ldots, t_1, t_{j+1})$. 

19. Go to 5. 
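
For readers who wish to experiment, steps 5 through 19 of the outline can be sketched in Python. This is an illustrative transcription under stated assumptions and not the original program: the special feature involving $S$ and $q$ (steps 1, 4, 6, 7, 13, 14) and the reduction (step 15) are omitted, positions are numbered from 0, and the names `three_opt` and `tour_cost` and the random integer cost matrix are hypothetical.

```python
import random

def tour_cost(t, d):
    """Total cost of the closed tour t."""
    return sum(d[t[i]][t[(i + 1) % len(t)]] for i in range(len(t)))

def three_opt(tour, d):
    """Apply the link exchanges of steps 8-18 until n successive rotations
    of the tour give no improvement (the termination test of step 5)."""
    t = list(tour)
    n = len(t)
    count = 0
    while count < n:                              # step 5
        improved = False
        for k in range(n - 3):                    # step 8 (0-based k)
            for j in range(k + 1, n - 1):         # step 9 (0-based j)
                removed = d[t[0]][t[-1]] + d[t[k]][t[k + 1]] + d[t[j]][t[j + 1]]
                keep = d[t[k]][t[j + 1]] + d[t[0]][t[j]]   # reconnection of step 16
                flip = d[t[0]][t[j + 1]] + d[t[k]][t[j]]   # reconnection of step 18
                added = min(keep, flip) + d[t[k + 1]][t[-1]]
                if added < removed:               # step 11: take first improvement
                    head = t[:k + 1]
                    if flip < keep:               # step 10 chooses 16 on ties
                        head.reverse()
                    t = t[j + 2:] + t[k + 1:j + 1] + head + [t[j + 1]]
                    improved = True
                    break
            if improved:
                break
        if improved:
            count = 0                             # steps 17 and 19: back to step 5
        else:
            t = [t[-1]] + t[:-1]                  # step 12: rotate
            count += 1
    return t

# Hypothetical test instance: random symmetric integer costs.
random.seed(1)
n = 12
d = [[0] * n for _ in range(n)]
for a in range(n):
    for b in range(a + 1, n):
        d[a][b] = d[b][a] = random.randint(1, 100)
start = random.sample(range(n), n)
result = three_opt(start, d)
print(tour_cost(start, d), "->", tour_cost(result, d))
```

Repeating the call from many random initial tours and keeping the cheapest result corresponds to the outer loop of step 2.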
APPENDIX C 

Estimates on the Probability that a 3-Opt Tour is Optimal 

From the statistics collected on running many problems, we estimate 
that a 3-opt tour in a 10-city problem of some difficulty has a probability 
of 0.5 of being optimal, and in general, this probability seems to 
decrease by a factor of 2 for each addition of 10 cities. We shall denote 
these estimates by $p(n)$ and use them as the basis for our calculations. We 
have 

$$p(n) \approx 2^{-n/10}.$$

So far, these estimates have been close for problems in the range between 
30 and 60 cities and are too conservative for smaller problems, or exceptionally 
easy problems like the 33-city problem. For exceptionally 
difficult problems, we define $p^*(n) = \frac{1}{4} p(n)$ as the estimate for the 
"worst." Computation time $t(n)$ per 3-opt tour in an n-city problem 
averages $30n^3$ microseconds without reduction. With reduction, we can 
obtain 100 3-opt tours usually in the amount of time needed for 25 3-opt 
tours without reduction. 

A few examples will show how to estimate time needed to "solve" 
a given problem. We consider a problem solved if the probability p of 
obtaining the optimal solution is $\ge 0.99$. It should be noted that the 
estimates are heuristic in nature and tend to be on the conservative 
side. 

Example 1: Given a 20-city problem, we have $p(20) = \frac{1}{4}$, $t(20) = 
0.24$ second. In order to have 

$$\left(\tfrac{3}{4}\right)^k \le 0.01$$

we must have $k > 16$. Thus 4 seconds of computation should be adequate. 
In the "worst" case $p^*(20) = \frac{1}{16}$, $k = 72$ should be adequate. 
Computation time is about 18 seconds if reduction is not used and 
about 5 seconds if reduction is used. 

Example 2: For a 40-city problem we have $p(40) = \frac{1}{16}$, $t(40) \approx 2$ 
seconds. With $k = 72$, we need 144 seconds without reduction and 
about 40 seconds with reduction. In the "worst" case $p^*(40) = \frac{1}{64}$ 
and $k = 300$ is sufficient. Running the program in 3 independent runs 
of 100 with reduction, total computation time needed is about 2.5 
minutes. 

Example 3: For a 60-city problem, we have $p(60) = \frac{1}{64}$, $t(60) \approx 6.5$ 
seconds. With $k = 300$ and using reduction, about 8 minutes of computation 
time is required to guarantee $p > 0.99$. In the "worst" case with 
$p^*(60) = \frac{1}{256}$, the same computation will yield $p \approx 0.7$. 

Example 4: For a 100-city problem $p(100) = \frac{1}{1024}$ and $t(100) \approx 30$ 
seconds. With $k = 800$ we have $p \approx 0.54$. With reduction, this can be 
achieved in about 100 minutes. 
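
The arithmetic in these examples can be checked with a few lines of Python. The sketch below simply evaluates $p(n) \approx 2^{-n/10}$, $t(n) = 30n^3$ microseconds, and $1 - (1 - p)^k$; the function names are hypothetical, and `runs_needed` returns the exact minimum $k$, whereas the examples round up to convenient values such as 300.

```python
import math

def p_opt(n):
    """Estimated probability that a 3-opt tour is optimal: 2^(-n/10)."""
    return 2.0 ** (-n / 10)

def t_run(n):
    """Estimated seconds per 3-opt tour without reduction: 30 n^3 microseconds."""
    return 30e-6 * n ** 3

def runs_needed(p, target=0.99):
    """Smallest k for which 1 - (1 - p)^k reaches the target probability."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))

def success(p, k):
    """Probability that at least one of k independent 3-opt tours is optimal."""
    return 1 - (1 - p) ** k

for n in (20, 40, 60):
    k = runs_needed(p_opt(n))
    k_worst = runs_needed(p_opt(n) / 4)
    print(f"n={n}: t(n)={t_run(n):.2f} s, k={k}, worst-case k={k_worst}")

# Example 4: 800 tours on a 100-city problem.
print(f"n=100: success with k=800 is {success(p_opt(100), 800):.2f}")
```

Multiplying `runs_needed` by `t_run` gives the time without reduction; the examples divide that by about 4 when reduction is used.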

APPENDIX D 

Two Conjectures 

Using the notation of Section III, the following appears to be an interesting 
problem: Find the minimum number $k$ for which $C_k = C_{k+1} = 
\cdots = C_n$. For fairly large $k$ it appears, at least intuitively, that a 
k-optimal tour should be optimal, since by Theorem 1 (Section III), if it is 
not optimal, then its index can be at most $n - k - 1$. Due to the intrinsic 
difficulties in this problem we can only state the following conjectures: 

Conjecture 1: $C_{n-1} = C_n$. That is, any tour which is $(n - 1)$-optimal 
is also optimal. 
Conjecture 2: $C_k = C_{k+1} \Rightarrow C_k = C_{k+1} = \cdots = C_n$. 

Conjecture 1 can be verified for $n \le 6$. Also an $(n - 1)$-optimal tour 
which is not optimal must have index 0, hence all of its links are inad- 
missible. Furthermore, it must also be n-length string optimal. The 
existence of such a tour seems to be extremely unlikely. Work on Con- 
jecture 1 has led to the following interesting problem in graph theory, 
which is equivalent to Conjecture 1, and yet involves no concept of any 
distance matrix. 

Problem: Suppose we are given a graph with n vertices and 2n links 
which can be partitioned into 2 sets of n links, each of which form a 
Hamiltonian circuit. Does there exist another partition with the same 
property? 

If the answer to the above problem is always in the affirmative, then 
we can prove Conjecture 1 in the following way. Consider a graph 
consisting of an optimal tour $T$ and an $(n - 1)$-optimal tour $T^*$ which 
is not optimal. Since $T^*$ has index 0, $T$ and $T^*$ have no link in common 
and thus serve as a partition into 2 sets of $n$ links, each of which forms a 
Hamiltonian circuit. Let the other partition with the same property be 
tours $T_1$ and $T_2$. Since $T_1$ and $T_2$ use the same set of $2n$ links as $T$ 
and $T^*$, 

$$C(T_1) + C(T_2) = C(T) + C(T^*) < 2C(T^*).$$

Hence one of the tours, say $T_1$, must have cost $C(T_1) < C(T^*)$. But $T_1$ 
and $T^*$ must have at least one link in common; hence $T^*$ cannot be 
$(n - 1)$-optimal. 

On the other hand, suppose there is a graph with $n$ vertices and $2n$ 
links which can be partitioned into 2 sets of $n$ links $A$ and $B$, each of 
which forms a Hamiltonian circuit, and that no other partition with the 
same property is possible. Let the $n$ vertices represent $n$ cities. We 
construct a distance matrix as follows: let each link in $A$ be assigned a 
distance $d_a$; each of $n - 1$ links of $B$ be assigned a distance $d_b$ and 
the remaining link in $B$, $d$. Let all other links in the complete graph of $n$ 
nodes have distance $> \max [n d_a, (n - 1)d_b + d]$. Suppose $n d_a < 
(n - 1)d_b + d$. Then it is clear that the set of $n$ links in $A$ forms the 
optimal tour while the set of $n$ links in $B$ forms an $(n - 1)$-optimal tour. 
Furthermore, we can make $d_a > d_b$ so that the $(n - 1)$-optimal tour 
which is not optimal contains the $n - 1$ smallest links, and by making $d$ 
large, we can also make the "next best" tour as poor as possible compared 
with the optimal tour. 

Conjecture 2 is obviously true for $k = n - 1$. We prove it is true for 
$k = 1$. By hypothesis, $C_1 = C_2$, that is, no tour crosses itself. Suppose 
there is a tour T which is not optimum, then we may transform T into 
the optimal solution by a sequence of transpositions of immediate 
neighbors, each step being equivalent to an inversion. Since the cost 
must finally reduce to the cost of the optimal tour, at some point we 
must have a tour with the property that transposing 2 immediate 
neighbors reduces the cost, giving us a crossover situation contradicting 
the hypothesis. 